Accelerating AI inferencing with external KV Cache on Managed Lustre
cloud.google.com·4h
Model Serving Economics
Run Multimodal Reasoning Agents with NVIDIA Nemotron on vLLM
blog.vllm.ai·20h
LLM Inference
Anyone else running their whole AI stack as Proxmox LXC containers? I'm currently using Open WebUI as the front-end, LiteLLM as a router, and a vLLM container per mod...
AI
Excited to introduce Supervised Reinforcement Learning, a framework that leverages expert trajectories to teach small LMs how to reason through hard problems ...
threadreaderapp.com·18h
New AI
KAITO and KubeFleet: Projects Solving AI Inference at Scale
thenewstack.io·3h
Inference Serving
MIT's Survey On Accelerators and Processors for Inference, With Peak Performance And Power Comparisons
semiengineering.com·3h
FAISS
Links for October 2025
eamag.me·20h
Prompt Engineering
Tencent/WeKnora
github.com·18h
Meilisearch
Vectorized Context-Aware Embeddings for GAT-Based Collaborative Filtering
arxiv.org·16h
BGE Embeddings
Show HN: GPU-accelerated sandboxes for running AI coding agents in parallel [video]
GPUs
On LLMs
kaukas.mataroa.blog·11h
Prompt Engineering
Introducing Project Telos: Modeling, Measuring, and Intervening on Goal-directed Behavior in AI Systems
lesswrong.com·11h
AI Safety
AI model identifies high-performing battery electrolytes by starting from just 58 data points
techxplore.com·23h
New AI
ClairS-TO: a deep-learning method for long-read tumor-only somatic small variant calling
nature.com·5h
Qdrant
Using Vision Language Models to Process Millions of Documents
pub.towardsai.net·22h
LLM Inference